Abstract —We propose a sampling theory for signals that are
supported on either directed or undirected graphs. The theory 
follows the same paradigm as classical sampling theory. We show 
that perfect recovery is possible for graph signals bandlimited 
under the graph Fourier transform. The sampled signal coefficients
form a new graph signal, whose corresponding graph
structure preserves the first-order difference of the original graph 
signal. For general graphs, an optimal sampling operator based on 
experimentally designed sampling is proposed to guarantee perfect 
recovery and robustness to noise; for graphs whose graph Fourier 
transforms are frames with maximal robustness to erasures 
as well as for Erdős-Rényi graphs, random sampling leads to
perfect recovery with high probability. We further establish the 
connection to the sampling theory of finite discrete-time signal 
processing and previous work on signal recovery on graphs. 
To handle full-band graph signals, we propose a graph filter 
bank based on sampling theory on graphs. Finally, we apply 
the proposed sampling theory to semi-supervised classification on 
online blogs and digit images, where we achieve similar or better 
performance with fewer labeled samples compared to previous 
work. 

Index Terms —Discrete signal processing on graphs, sampling 
theory, experimentally designed sampling, compressed sensing 


I. Introduction 

With the explosive growth of information and communication,
signals are generated at an unprecedented rate from
various sources, including social, citation, biological, and physical
infrastructure [1], [2], among others. Unlike time-series
signals or images, these signals possess complex, irregular
structure, which requires novel processing techniques, leading
to the emerging field of signal processing on graphs [3], [4].

Signal processing on graphs extends classical discrete signal 
processing to signals with an underlying complex, irregular 
structure. The framework models that underlying structure by 
a graph and signals by graph signals, generalizing concepts and 
tools from classical discrete signal processing to graph signal 
processing. Recent work includes graph-based filtering [5],
[6], [7], graph-based transforms [5], [8], [9], sampling and
interpolation on graphs [10], [11], [12], uncertainty principle
on graphs [13], semi-supervised classification on graphs [14],
[15], [16], graph dictionary learning [17], [18], denoising [6],
[19], community detection and clustering on graphs [20], [21],
[22], graph signal recovery [23], [24], [25], and distributed
algorithms [26], [27].
Two basic approaches to signal processing on graphs have
been considered. The first is rooted in spectral graph theory
and builds upon the graph Laplacian matrix [3]. Since the
standard graph Laplacian matrix is restricted to be symmetric
and positive semi-definite, this approach is applicable only to
undirected graphs with real and nonnegative edge weights.
The second approach, discrete signal processing on graphs
(DSP$_{\mathrm{G}}$) [5], [28], is rooted in the algebraic signal processing
theory [29], [30] and builds upon the graph shift operator,
which works as the elementary operator that generates all linear 
shift-invariant filters for signals with a given structure. The 
graph shift operator is the adjacency matrix and represents the 
relational dependencies between each pair of nodes. Since the 
graph shift is not restricted to be symmetric, the corresponding 
framework is applicable to arbitrary graphs, those with undirected
or directed edges, with real or complex, nonnegative
or negative weights. Both frameworks analyze signals with
complex, irregular structure, generalizing a series of concepts
and tools from classical signal processing, such as graph filters
and the graph Fourier transform, to diverse graph-based applications.

In this paper, we consider the classical signal processing 
task of sampling and interpolation within the framework of 
DSP$_{\mathrm{G}}$ [5], [28]. As the bridge connecting sequences and
functions, classical sampling theory shows that a bandlimited 
function can be perfectly recovered from its sampled sequence 
if the sampling rate is high enough 1331 . More generally, 
we can treat any decrease in dimension via a linear operator 
as sampling, and, conversely, any increase in dimension via 
a linear operator as interpolation ED, El- Formulating a 
sampling theory in this context is equivalent to moving between 
higher- and lower-dimensional spaces. 

A sampling theory for graphs has interesting applications. 
For example, given a graph representing friendship connectivity 
on Facebook, we can sample a fraction of users and query 
their hobbies and then recover all users’ hobbies. The task 
of sampling on graphs is, however, not well understood ED, 
E2), because graph signals lie on complex, irregular structures. 
It is even more challenging to find a graph structure that is 
associated with the sampled signal coefficients; in the Facebook 
example, we sample a small fraction of users and an associated 
graph structure would allow us to infer new connectivity 
between those sampled users, even when they are not directly 
connected in the original graph. 

Previous works on sampling theory [10], [11], [12] consider
graph signals that are uniquely sampled onto a given subset 
of nodes. This approach is hard to apply to directed graphs. 
It also does not explain which graph structure supports these 
sampled coefficients. 

In this paper, we propose a sampling theory for signals that 
are supported on either directed or undirected graphs. Perfect
recovery is possible for graph signals bandlimited under the 
graph Fourier transform. We also show that the sampled signal 
coefficients form a new graph signal whose corresponding 
graph structure is constructed from the original graph structure. 
The proposed sampling theory follows Chapter 5 from m and 
is consistent with classical sampling theory. 

We call a sampling operator that leads to perfect recovery a 
qualified sampling operator. For general graphs, we propose
an optimal sampling operator based on experimentally designed
sampling that guarantees perfect recovery and robustness
to noise; for graphs whose graph Fourier transforms
are frames with maximal robustness to erasures, as well as for
Erdős-Rényi graphs, we show that random sampling leads to perfect
recovery with high probability. We further establish the connection to
sampling theory of finite discrete-time signal processing and 
previous works on sampling theory on graphs. To handle 
full-band graph signals, we propose graph filter banks to 
force graph signals to be bandlimited. Finally, we apply the 
proposed sampling theory to semi-supervised classification of 
online blogs and digit images, where we achieve similar or 
better performance with fewer labeled samples compared to 
the previous works. 

Contributions. The main contributions of the paper are as 
follows: 


• A novel framework for sampling a graph signal, which 
solves complex sampling problems by using simple tools 
from linear algebra; 

• A novel approach for sampling a graph by preserving the 
first-order difference of the original graph signal; 

• A novel approach for designing a sampling operator on 
graphs. 


Outline of the paper. Section II formulates the problem and
briefly reviews DSP$_{\mathrm{G}}$, which lays the foundation for this paper;
Section III describes the proposed sampling theory for graph
signals, and the proposed construction of graph structures for
the sampled signal coefficients; Section IV studies the qualified
sampling operator, including random sampling and experimentally
designed sampling; Section V discusses the relations to
previous works and extends the sampling framework to the
design of graph filter banks; Section VI shows the application
to semi-supervised learning; Section VII concludes the paper
and provides pointers to future directions.


II. Discrete Signal Processing on Graphs 

In this section, we briefly review relevant concepts of 
discrete signal processing on graphs; a thorough introduction 
can be found in [5], [28]. It is a theoretical framework that
generalizes classical discrete signal processing from regular 
domains, such as lines and rectangular lattices, to irregular 
structures that are commonly described by graphs. 


A. Graph Shift 

Discrete signal processing on graphs studies signals with
complex, irregular structure represented by a graph $G = (\mathcal{V}, A)$,
where $\mathcal{V} = \{v_0, \ldots, v_{N-1}\}$ is the set of nodes and
$A \in \mathbb{C}^{N \times N}$ is the graph shift, or a weighted adjacency matrix.
It represents the connections of the graph $G$, which can be
either directed or undirected (note that the standard graph
Laplacian matrix can only represent undirected graphs [3]). The
edge weight $A_{n,m}$ between nodes $v_n$ and $v_m$ is a quantitative
expression of the underlying relation between the $n$th and the
$m$th node, such as similarity, dependency, or a communication
pattern. To guarantee that the shift operator is properly scaled,
we normalize the graph shift $A$ to satisfy $|\lambda_{\max}(A)| = 1$.
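
The normalization above can be sketched numerically as follows. This is a minimal illustration, assuming NumPy; the helper name `normalize_shift` and the three-node weighted directed cycle are hypothetical and are not taken from the paper.

```python
import numpy as np

def normalize_shift(A):
    """Scale A so that its largest-magnitude eigenvalue has magnitude 1."""
    lam_max = np.max(np.abs(np.linalg.eigvals(A)))
    if lam_max == 0:
        raise ValueError("all eigenvalues are zero; the shift cannot be normalized")
    return A / lam_max

# Hypothetical weighted directed cycle on three nodes (illustrative only).
A = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [4.0, 0.0, 0.0]])
A_norm = normalize_shift(A)
spectral_radius = np.max(np.abs(np.linalg.eigvals(A_norm)))
```

Dividing by a scalar leaves the eigenvectors unchanged, so the graph Fourier basis introduced below is not affected by this scaling.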

B. Graph Signal 

Given the graph representation $G = (\mathcal{V}, A)$, a graph signal
is defined as the map on the graph nodes that assigns the signal
coefficient $x_n \in \mathbb{C}$ to the node $v_n$. Once the node order is fixed,
the graph signal can be written as a vector

$$x = \begin{bmatrix} x_0 & x_1 & \ldots & x_{N-1} \end{bmatrix}^T \in \mathbb{C}^N, \tag{1}$$

where the $n$th signal coefficient corresponds to node $v_n$.

C. Graph Fourier Transform 

In general, a Fourier transform corresponds to the expansion 
of a signal using basis elements that are invariant to filtering; 
here, this basis is the eigenbasis of the graph shift A (or, if 
the complete eigenbasis does not exist, the Jordan eigenbasis 
of A). For simplicity, assume that A has a complete eigenbasis 
and the spectral decomposition of A is ED 

$$A = V \Lambda V^{-1}, \tag{2}$$

where the eigenvectors of $A$ form the columns of matrix $V$,
and $\Lambda \in \mathbb{C}^{N \times N}$ is the diagonal matrix of corresponding
eigenvalues $\lambda_0, \ldots, \lambda_{N-1}$ of $A$. These eigenvalues represent
frequencies on the graph [28]. We do not specify the ordering
of graph frequencies here; we explain why in Section III-B.

Definition 1. The graph Fourier transform of $x \in \mathbb{C}^N$ is

$$\widehat{x} = V^{-1} x. \tag{3}$$

The inverse graph Fourier transform is

$$x = V \widehat{x}.$$

The vector $\widehat{x}$ in (3) represents the signal's expansion in the
eigenvector basis and describes the frequency content of the
graph signal $x$. The inverse graph Fourier transform reconstructs
the graph signal from its frequency content by combining
graph frequency components weighted by the coefficients
of the signal's graph Fourier transform.
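
Definition 1 can be illustrated with a short numerical sketch. It assumes NumPy and a shift with a complete eigenbasis; the function names `gft` and `igft` and the directed four-cycle are illustrative choices, not part of the paper.

```python
import numpy as np

def gft(V, x):
    """Graph Fourier transform: solve V x_hat = x, that is, x_hat = V^{-1} x."""
    return np.linalg.solve(V, x)

def igft(V, x_hat):
    """Inverse graph Fourier transform: x = V x_hat."""
    return V @ x_hat

# Illustrative shift: directed 4-cycle (cyclic permutation). Its eigenvalues
# are the 4th roots of unity, so a complete eigenbasis exists.
A = np.roll(np.eye(4), 1, axis=1)
eigvals, V = np.linalg.eig(A)          # A = V diag(eigvals) V^{-1}

x = np.array([1.0, 2.0, 3.0, 4.0])
x_hat = gft(V, x)                      # frequency content of x
x_rec = igft(V, x_hat)                 # reconstruction from the frequency content
```

Solving the linear system instead of forming $V^{-1}$ explicitly is a numerical convenience only; both give the transform in (3).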

III. Sampling on Graphs 

Previous work on sampling theory of graph signals is based
on spectral graph theory [12]. The bandwidth of graph signals
is defined based on the value of graph frequencies, which
correspond to the eigenvalues of the graph Laplacian matrix.
Since each graph has its own graph frequencies, it is hard in
practice to specify a general cut-off graph frequency; it is also
computationally inefficient to compute all the values of graph
frequencies, especially for large graphs.

In this section, we propose a novel sampling framework for
graph signals. Here, the bandwidth definition is based on the
number of non-zero signal coefficients in the graph Fourier domain.
Since each signal coefficient in the graph Fourier domain
corresponds to a graph frequency, the bandwidth definition is
also based on the number of graph frequencies. This makes
the proposed sampling framework strongly connected to linear
algebra, that is, we are allowed to use simple tools from linear
algebra to perform sampling on complex, irregular graphs.

TABLE I: Key notation used in the paper.

| Symbol | Description | Dimension |
|---|---|---|
| $A$ | graph shift | $N \times N$ |
| $x$ | graph signal | $N$ |
| $V^{-1}$ | graph Fourier transform matrix | $N \times N$ |
| $\widehat{x}$ | graph signal in the frequency domain | $N$ |
| $\Psi$ | sampling operator | $M \times N$ |
| $\Phi$ | interpolation operator | $N \times M$ |
| $\mathcal{M}$ | sampled indices | |
| $x_{\mathcal{M}}$ | sampled signal coefficients of $x$ | $M$ |
| $\widehat{x}_{(K)}$ | first $K$ coefficients of $\widehat{x}$ | $K$ |
| $V_{(K)}$ | first $K$ columns of $V$ | $N \times K$ |

Fig. 1: Sampling followed by interpolation.

A. Sampling & Interpolation

Suppose that we want to sample $M$ coefficients of a graph
signal $x \in \mathbb{C}^N$ to produce a sampled signal $x_{\mathcal{M}} \in \mathbb{C}^M$
($M < N$), where $\mathcal{M} = (\mathcal{M}_0, \ldots, \mathcal{M}_{M-1})$ denotes the
sequence of sampled indices, and $\mathcal{M}_i \in \{0, 1, \ldots, N-1\}$.
We then interpolate $x_{\mathcal{M}}$ to get $x' \in \mathbb{C}^N$, which recovers $x$
either exactly or approximately. The sampling operator $\Psi$ is a
linear mapping from $\mathbb{C}^N$ to $\mathbb{C}^M$, defined as

$$\Psi_{i,j} = \begin{cases} 1, & j = \mathcal{M}_i; \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$

and the interpolation operator $\Phi$ is a linear mapping from $\mathbb{C}^M$
to $\mathbb{C}^N$ (see Figure 1),

sampling: $x_{\mathcal{M}} = \Psi x \in \mathbb{C}^M$,

interpolation: $x' = \Phi x_{\mathcal{M}} = \Phi \Psi x \in \mathbb{C}^N$,

where $x' \in \mathbb{C}^N$ recovers $x$ either exactly or approximately.
We consider two sampling strategies: random sampling means
that sample indices are chosen from $\{0, 1, \ldots, N-1\}$ independently
and randomly; and experimentally designed sampling
means that sample indices can be chosen beforehand. It is
clear that random sampling is a subset of experimentally
designed sampling.

Perfect recovery happens for all $x$ when $\Phi \Psi$ is the identity
matrix. This is not possible in general because $\operatorname{rank}(\Phi \Psi) \le M < N$;
it is, however, possible to do this for signals with
specific structure that we will define as bandlimited graph
signals, as in classical discrete signal processing.

B. Sampling Theory for Graph Signals

We now define a class of bandlimited graph signals, which
makes perfect recovery possible.

Definition 2. A graph signal is called bandlimited when there
exists a $K \in \{0, 1, \ldots, N-1\}$ such that its graph Fourier
transform $\widehat{x}$ satisfies

$$\widehat{x}_k = 0 \quad \text{for all } k \ge K.$$

The smallest such $K$ is called the bandwidth of $x$. A graph
signal that is not bandlimited is called a full-band graph signal.

Note that the bandlimited graph signals here do not neces¬ 
sarily mean low-pass, or smooth. Since we do not specify the 
ordering of frequencies, we can reorder the eigenvalues and 
permute the corresponding eigenvectors in the graph Fourier 
transform matrix to choose any band in the graph Fourier 
domain. The bandlimited graph signals are smooth only when 
we sort the eigenvalues in a descending order. The bandlimited 
restriction here is equivalent to limiting the number of non-zero 
signal coefficients in the graph Fourier domain with known 
supports. This generalization is potentially useful to represent 
non-smooth graph signals. 

Definition 3. The set of graph signals in $\mathbb{C}^N$ with bandwidth
of at most $K$ is a closed subspace denoted $\mathrm{BL}_K(V^{-1})$, with
$V^{-1}$ as in (2).

When defining the bandwidth, we focus on the number of 
graph frequencies, while previous works m focus on the 
value of graph frequencies. There are two shortcomings to 
using the values of graph frequencies: (a) When considering 
the values of graph frequencies, we ignore the discrete nature 
of graphs; because graph frequencies are discrete, two cut-off 
graph frequencies on the same graph can lead to the same 
bandlimited space. For example, assume a graph has graph 
frequencies 0, 0.1, 0.4, 0.6 and 2; when we set the cut-off 
frequency to either 0.2 or 0.3, they lead to the same bandlimited 
space; (b) The values of graph frequencies cannot be compared 
between different graphs. Since each graph has its own graph
frequencies, the same value of the cut-off graph frequency on
two graphs can mean different things. For example, one graph
has graph frequencies 0, 0.1, 0.2, 0.4, and 2, and another has
graph frequencies 0, 1.1, 1.6, 1.8, and 2; when we set the cut-off
frequency to 1, that is, we preserve all the graph frequencies
that are no greater than 1, the first graph preserves four out of
five graph frequencies and the second graph preserves only
one out of five. The values of graph frequencies thus do not
necessarily give a direct and intuitive understanding about the 
bandlimited space. Another key advantage of using the number 
of graph frequencies is to build a strong connection to linear 
algebra allowing for the use of simple tools from linear algebra 
in sampling and interpolation of bandlimited graph signals. 

In Theorem 5.2 in ED, the authors show the recovery for 
vectors via projection, which lays the theoretical foundation 
for the classical sampling theory. Following the theorem, we 
obtain the following result, the proof of which can be found 
in El- 













IEEE TRANS. SIGNAL PROCESS. TO APPEAR. 




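Theorem 1. Let $\Psi$ satisfy

$$\operatorname{rank}(\Psi V_{(K)}) = K,$$

where $V_{(K)} \in \mathbb{R}^{N \times K}$ denotes the first $K$ columns of $V$. For
all $x \in \mathrm{BL}_K(V^{-1})$, perfect recovery, $x = \Phi \Psi x$, is achieved
by choosing

$$\Phi = V_{(K)} U,$$

with $U \Psi V_{(K)}$ a $K \times K$ identity matrix.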

Theorem 1 is applicable for all graph signals that have a few
non-zero elements in the graph Fourier domain with known
supports, that is, $K < N$.

Similarly to the classical sampling theory, the sampling rate
has a lower bound for graph signals as well, that is, the sample
size $M$ should be no smaller than the bandwidth $K$. When
$M < K$, $\operatorname{rank}(U \Psi V_{(K)}) \le \operatorname{rank}(U) \le M < K$, and thus,
$U \Psi V_{(K)}$ can never be an identity matrix. For $U \Psi V_{(K)}$ to be
an identity matrix, $U$ is the inverse of $\Psi V_{(K)}$ when $M = K$;
it is a pseudo-inverse of $\Psi V_{(K)}$ when $M > K$, where the
redundancy can be useful for reducing the influence of noise.
For simplicity, we only consider $M = K$ and $U$ invertible.
When $M > K$, we simply select $K$ out of $M$ sampled signal
coefficients to ensure that the sample size and the bandwidth
are the same.
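
The recovery procedure of Theorem 1 can be sketched as follows. This is a minimal illustration under stated assumptions: NumPy, a random symmetric shift (so that a real eigenbasis exists), and hypothetical helper names `sampling_operator` and `interpolation_operator`. The pseudo-inverse covers both the $M = K$ and the $M > K$ case.

```python
import numpy as np

def sampling_operator(indices, N):
    """Build Psi (M x N) with Psi[i, j] = 1 when j equals the i-th sampled index."""
    Psi = np.zeros((len(indices), N))
    Psi[np.arange(len(indices)), indices] = 1.0
    return Psi

def interpolation_operator(V_K, Psi):
    """Phi = V_(K) U, with U the (pseudo-)inverse of Psi V_(K)."""
    U = np.linalg.pinv(Psi @ V_K)
    return V_K @ U

rng = np.random.default_rng(0)
N, K = 8, 3
W = rng.random((N, N))
A = (W + W.T) / 2                       # hypothetical undirected weighted graph
_, V = np.linalg.eigh(A)
V_K = V[:, :K]                          # first K columns of V (ordering is arbitrary)

x = V_K @ np.array([1.0, -0.5, 2.0])    # bandlimited signal with bandwidth K
Psi = sampling_operator([0, 3, 5, 6], N)   # M = 4 > K samples
Phi = interpolation_operator(V_K, Psi)
x_rec = Phi @ (Psi @ x)                 # interpolation of the sampled signal
```

The sampled indices here are arbitrary; recovery is exact only when $\Psi V_{(K)}$ has rank $K$, which is what the rank check in Theorem 1 expresses.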

From Theorem 1, we see that an arbitrary sampling operator
may not lead to perfect recovery even for bandlimited graph
signals. When the sampling operator satisfies the full-rank
assumption of Theorem 1, we call it a qualified sampling operator. To
satisfy this assumption, the sampling operator should select at least one
set of $K$ linearly independent rows in $V_{(K)}$. Since $V$ is
invertible, the column vectors in $V$ are linearly independent and
$\operatorname{rank}(V_{(K)}) = K$ always holds; in other words, at least one set
of $K$ linearly independent rows in $V_{(K)}$ always exists. Since
the graph shift $A$ is given, one can find such a set independently
of the graph signal. Given such a set, Theorem 1 guarantees
perfect recovery of bandlimited graph signals. To find linearly
independent rows in a matrix, fast algorithms exist, such as QR
decomposition; see [36], [37]. Since we only need to know the
graph structure to design a qualified sampling operator, this
follows the experimentally designed sampling. We will expand
on this topic in Section IV.
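
One way to find $K$ linearly independent rows can be sketched as below. This is not the QR routine cited in the text but a simple pivoted Gram-Schmidt stand-in written with NumPy only; the function name `independent_rows` and the toy matrix are hypothetical.

```python
import numpy as np

def independent_rows(V_K, tol=1e-10):
    """Pick K linearly independent rows of V_K by pivoted Gram-Schmidt."""
    R = np.array(V_K, dtype=complex)     # residual rows, updated in place of V_K
    K = R.shape[1]
    chosen = []
    for _ in range(K):
        norms = np.linalg.norm(R, axis=1)
        i = int(np.argmax(norms))        # pivot: row with the largest residual
        if norms[i] < tol:
            raise ValueError("V_K does not have full column rank")
        chosen.append(i)
        q = R[i] / norms[i]
        # Remove the component along q from every row.
        R = R - np.outer(R @ q.conj(), q)
    return sorted(chosen)

# Toy matrix: rows 0, 1, 2 are parallel, so a valid choice must include row 3.
V_K = np.array([[1.0, 0.0],
                [1.0, 0.0],
                [2.0, 0.0],
                [0.0, 1.0]])
rows = independent_rows(V_K)
```

The selection depends only on $V_{(K)}$, hence only on the graph shift, which matches the remark that a qualified sampling operator can be designed without knowing the signal.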


C. Sampled Graph Signal 

We just showed that perfect recovery is possible when the
graph signal is bandlimited. We now show that the sampled signal
coefficients form a new graph signal, whose corresponding
graph shift can be constructed from the original graph shift.

Although the following results can be generalized to $M > K$
easily, we only consider $M = K$ for simplicity. Let the
sampling operator $\Psi$ and the interpolation operator $\Phi$ satisfy
the conditions in Theorem 1. For all $x \in \mathrm{BL}_K(V^{-1})$, we have

$$x = \Phi \Psi x = \Phi x_{\mathcal{M}} \overset{(a)}{=} V_{(K)} U x_{\mathcal{M}} \overset{(b)}{=} V_{(K)} \widehat{x}_{(K)},$$

where $\widehat{x}_{(K)}$ denotes the first $K$ coefficients of $\widehat{x}$, (a) follows
from Theorem 1 and (b) from Definition 2. We thus get

$$\widehat{x}_{(K)} = U x_{\mathcal{M}},$$

and

$$x_{\mathcal{M}} = U^{-1} U x_{\mathcal{M}} = U^{-1} \widehat{x}_{(K)}.$$

From what we have seen, the sampled signal coefficients $x_{\mathcal{M}}$
and the frequency content $\widehat{x}_{(K)}$ form a Fourier pair because
$x_{\mathcal{M}}$ can be constructed from $\widehat{x}_{(K)}$ through $U^{-1}$ and $\widehat{x}_{(K)}$ can
also be constructed from $x_{\mathcal{M}}$ through $U$. This implies that, according
to Definition 1 and the spectral decomposition (2), $x_{\mathcal{M}}$
is a graph signal associated with the graph Fourier transform
matrix $U$ and a new graph shift

$$A_{\mathcal{M}} = U^{-1} \Lambda_{(K)} U \in \mathbb{C}^{K \times K},$$

where $\Lambda_{(K)} \in \mathbb{C}^{K \times K}$ is a diagonal matrix that samples the
first $K$ eigenvalues of $\Lambda$. This leads to the following theorem.

Theorem 2. Let $x \in \mathrm{BL}_K(V^{-1})$ and let

$$x_{\mathcal{M}} = \Psi x \in \mathbb{C}^K$$

be its sampled version, where $\Psi$ is a qualified sampling
operator. Then, the graph shift associated with the graph signal
$x_{\mathcal{M}}$ is

$$A_{\mathcal{M}} = U^{-1} \Lambda_{(K)} U \in \mathbb{C}^{K \times K}, \tag{5}$$

with $U = (\Psi V_{(K)})^{-1}$.

From Theorem 2, we see that the graph shift $A_{\mathcal{M}}$ is
constructed by sampling the rows of the eigenvector matrix and
sampling the first $K$ eigenvalues of the original graph shift $A$.
We simply say that $A_{\mathcal{M}}$ is sampled from $A$, preserving certain
information in the graph Fourier domain.

Since the bandwidth of $x$ is $K$, the first $K$ coefficients in
the frequency domain are $\widehat{x}_{(K)} = \widehat{x}_{\mathcal{M}}$, where $\widehat{x}_{\mathcal{M}} = U x_{\mathcal{M}}$ is
the graph Fourier transform of the sampled signal, and the other $N - K$
coefficients are $\widehat{x}_{(-K)} = 0$; in other words, the frequency
contents of the original graph signal $x$ and the sampled graph
signal $x_{\mathcal{M}}$ are equivalent after performing their corresponding
graph Fourier transforms.

Similarly to Theorem 1, by reordering the eigenvalues and
permuting the corresponding eigenvectors in the graph Fourier
transform matrix, Theorem 2 is applicable to all graph signals
that have limited support in the graph Fourier domain.
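
The construction in Theorem 2 can be checked numerically. The sketch below assumes NumPy and a random symmetric shift chosen only for illustration (no normalization of $A$ is applied, since it does not affect the construction); the function name `sampled_graph_shift` is hypothetical.

```python
import numpy as np

def sampled_graph_shift(V, eigvals, idx, K):
    """Return (A_M, U) with A_M = U^{-1} Lambda_(K) U and U = (Psi V_(K))^{-1}."""
    U_inv = V[idx, :K]                   # Psi V_(K): sampled rows of the first K columns
    U = np.linalg.inv(U_inv)
    A_M = U_inv @ np.diag(eigvals[:K]) @ U
    return A_M, U

rng = np.random.default_rng(1)
N, K = 6, 3
W = rng.random((N, N))
A = (W + W.T) / 2                        # hypothetical undirected weighted graph
eigvals, V = np.linalg.eigh(A)

idx = [0, 2, 5]                          # M = K sampled indices (arbitrary choice)
A_M, U = sampled_graph_shift(V, eigvals, idx, K)

x_hat_K = np.array([1.0, 0.5, -2.0])     # frequency content of a bandlimited signal
x_M = (V[:, :K] @ x_hat_K)[idx]          # sampled signal coefficients
x_hat_M = U @ x_M                        # graph Fourier transform of the sampled signal
```

In this sketch `x_hat_M` coincides with `x_hat_K`, and the eigenvalues of `A_M` are the first $K$ eigenvalues of $A$, which is the sense in which $A_{\mathcal{M}}$ is sampled from $A$.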

D. Property of A Sampled Graph Signal 

We argued that $A_{\mathcal{M}} = U^{-1} \Lambda_{(K)} U$ is the graph shift that
supports the sampled signal coefficients $x_{\mathcal{M}}$ following from a
mathematical equivalence between the graph Fourier transform
for the sampled graph signal and the graph shift. We, in fact,
implicitly proposed an approach to sampling graphs. Since
sampled graphs always lose information, we now study which
information $A_{\mathcal{M}}$ preserves.

Theorem 3. For all $x \in \mathrm{BL}_K(V^{-1})$,

$$x_{\mathcal{M}} - A_{\mathcal{M}} x_{\mathcal{M}} = \Psi (x - A x).$$

Proof.

$$\begin{aligned}
x_{\mathcal{M}} - A_{\mathcal{M}} x_{\mathcal{M}} &= U^{-1} \widehat{x}_{(K)} - U^{-1} \Lambda_{(K)} U U^{-1} \widehat{x}_{(K)} \\
&= \Psi V_{(K)} (I - \Lambda_{(K)}) \widehat{x}_{(K)} \\
&= \Psi (x - A x),
\end{aligned}$$

where the last equality follows from $x \in \mathrm{BL}_K(V^{-1})$. □
Fig. 2: Sampling followed by interpolation. The arrows indicate that the edges are directed.


The term $x - Ax$ measures the difference between the
original graph signal and its shifted version. This is also called
the first-order difference of $x$, while the term $\Psi (x - A x)$
measures the first-order difference of $x$ at sampled indices.
Furthermore, $\|x - Ax\|_p^p$ is the graph total variation based on
the $\ell_p$-norm, which is a quantitative characteristic that measures
the smoothness of a graph signal [28]. When using a sampled
graph to represent the sampled signal coefficients, we lose the
connectivity information between the sampled nodes and all
the other nodes; despite this, $A_{\mathcal{M}}$ still preserves the first-order
difference at sampled indices. Instead of focusing on preserving 
connectivity properties as in prior work ED, we emphasize the 
interplay between signals and structures. 
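
Theorem 3 also holds for directed graphs, which the following sketch illustrates. It assumes NumPy and a random dense directed shift (hypothetical, for illustration only); the eigenvectors are complex in general, and the "first $K$" columns are simply the first $K$ returned by the eigensolver.

```python
import numpy as np

rng = np.random.default_rng(42)
N, K = 7, 3
A = rng.random((N, N))                   # hypothetical directed weighted graph
eigvals, V = np.linalg.eig(A)            # complex eigenpairs in general
V_K = V[:, :K]

x = V_K @ rng.standard_normal(K)         # bandlimited signal, x in BL_K(V^{-1})

idx = [0, 2, 5]                          # sampled indices (arbitrary choice)
U_inv = V_K[idx, :]                      # Psi V_(K)
U = np.linalg.inv(U_inv)
A_M = U_inv @ np.diag(eigvals[:K]) @ U   # sampled graph shift of Theorem 2
x_M = x[idx]

diff_sampled = x_M - A_M @ x_M           # first-order difference on the sampled graph
diff_original = (x - A @ x)[idx]         # Psi (x - A x)
```

The two vectors agree up to floating-point error whenever $\Psi V_{(K)}$ is invertible, even though `A_M` generally looks nothing like a submatrix of `A`.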


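E. Example

We consider a five-node directed graph with graph shift $A$
(the entries of $A$ are not legible in this copy). The corresponding
inverse graph Fourier transform matrix is

$$V = \begin{bmatrix}
0.45 & 0.19 & 0.25 & 0.35 & -0.40 \\
0.45 & 0.40 & 0.16 & -0.74 & 0.18 \\
0.45 & 0.08 & -0.56 & 0.29 & 0.36 \\
0.45 & -0.66 & -0.41 & -0.47 & -0.57 \\
0.45 & -0.60 & 0.66 & 0.13 & 0.59
\end{bmatrix},$$

and the frequencies are

$$\Lambda = \operatorname{diag} \begin{bmatrix} 1 & 0.39 & -0.12 & -0.44 & -0.83 \end{bmatrix}.$$

Let $K = 3$; generate a bandlimited graph signal $x \in \mathrm{BL}_3(V^{-1})$
as

$$\widehat{x} = \begin{bmatrix} 0.5 & 0.2 & 0.1 & 0 & 0 \end{bmatrix}^T,$$

with

$$x = \begin{bmatrix} 0.29 & 0.32 & 0.18 & 0.05 & 0.17 \end{bmatrix}^T,$$

and the first-order difference of $x$ is

$$x - Ax = \begin{bmatrix} 0.05 & 0.07 & -0.05 & -0.13 & 0.0002 \end{bmatrix}^T.$$

We can check the first three columns of $V$ to see that all
sets of three rows are independent. According to the sampling
theorem, we can then recover $x$ perfectly by sampling any
three of its coefficients; for example, sample the first, second
and the fourth coefficients. Then, $\mathcal{M} = (1, 2, 4)$,
$x_{\mathcal{M}} = \begin{bmatrix} 0.29 & 0.32 & 0.05 \end{bmatrix}^T$, and the sampling operator

$$\Psi = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0
\end{bmatrix}$$

is qualified. We recover $x$ by using the following interpolation
operator (see Figure 2)

$$\Phi = V_{(3)} (\Psi V_{(3)})^{-1} = \begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
-2.7 & 2.87 & 0.83 \\
0 & 0 & 1 \\
5.04 & -3.98 & -0.05
\end{bmatrix}.$$

The inverse graph Fourier transform matrix for the sampled
signal is

$$U^{-1} = \Psi V_{(3)} = \begin{bmatrix}
0.45 & 0.19 & 0.25 \\
0.45 & 0.40 & 0.16 \\
0.45 & -0.66 & -0.41
\end{bmatrix},$$

and the sampled frequencies are

$$\Lambda_{(3)} = \begin{bmatrix}
1 & 0 & 0 \\
0 & 0.39 & 0 \\
0 & 0 & -0.12
\end{bmatrix}.$$

The sampled graph shift is then constructed as

$$A_{\mathcal{M}} = U^{-1} \Lambda_{(3)} U = \begin{bmatrix}
-0.07 & 0.75 & 0.32 \\
-0.23 & 0.96 & 0.28 \\
1.17 & -0.56 & 0.39
\end{bmatrix}.$$

The first-order difference of $x_{\mathcal{M}}$ is

$$x_{\mathcal{M}} - A_{\mathcal{M}} x_{\mathcal{M}} = \begin{bmatrix} 0.05 & 0.07 & -0.13 \end{bmatrix}^T = \Psi (x - A x).$$

We see that the sampled graph shift contains self-loops and
negative weights and seems to be dissimilar to $A$, but $A_{\mathcal{M}}$
preserves a part of the frequency content of $A$ because $U^{-1}$
is sampled from $V$ and $\Lambda_{(3)}$ is sampled from $\Lambda$. $A_{\mathcal{M}}$ also
preserves the first-order difference of $x$, which validates
Theorem 3.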

IV. Sampling with A Qualified Sampling Operator 

As shown in Section III, only a qualified sampling operator,
that is, one satisfying the full-rank assumption of Theorem 1,
can lead to perfect recovery for bandlimited graph
signals. Since a qualified sampling operator is designed
via the graph structure, it belongs to experimentally designed
sampling. The design consists of finding $K$ linearly independent
rows in $V_{(K)}$, which gives multiple choices. In this section, we
propose an optimal approach to designing a qualified sampling
operator by minimizing the effect of noise for general graphs.
We then show that for some specific graphs, random sampling
also leads to perfect recovery with high probability.

Fig. 3: Sampling a graph.


A. Experimentally Designed Sampling 

We now show how to design a qualified sampling operator 
on any given graph that is robust to noise. We then compare this 
optimal sampling operator with a random sampling operator on 
a sensor network. 

1) Optimal Sampling Operator: As mentioned in Section III-B,
at least one set of $K$ linearly independent rows
in $V_{(K)}$ always exists. When we have multiple choices of $K$
linearly independent rows, we aim to find the optimal one to
minimize the effect of noise.

We consider a model where noise $\epsilon$ is introduced during
sampling as follows,

$$x_{\mathcal{M}} = \Psi x + \epsilon,$$

where $\Psi$ is a qualified sampling operator. The recovered graph
signal, $x'_{\epsilon}$, is then

$$x'_{\epsilon} = \Phi x_{\mathcal{M}} = \Phi \Psi x + \Phi \epsilon = x + \Phi \epsilon.$$

To bound the effect of noise, we have

$$\|x'_{\epsilon} - x\|_2 = \|\Phi \epsilon\|_2 = \|V_{(K)} U \epsilon\|_2 \le \|V_{(K)}\|_2 \, \|U\|_2 \, \|\epsilon\|_2,$$

where the inequality follows from the definition of the spectral
norm. Since $\|V_{(K)}\|_2$ and $\|\epsilon\|_2$ are fixed, we want $U$ to have a
small spectral norm. From this perspective, for each feasible $\Psi$,
we compute the inverse or pseudo-inverse of $\Psi V_{(K)}$ to obtain
$U$; the best choice comes from the $U$ with the smallest spectral
norm. This is equivalent to maximizing the smallest singular
value of $\Psi V_{(K)}$,

$$\Psi^{\mathrm{opt}} = \arg\max_{\Psi} \sigma_{\min}(\Psi V_{(K)}), \tag{6}$$

where $\sigma_{\min}$ denotes the smallest singular value. The solution
of (6) is optimal in terms of minimizing the effect of noise;
we simply call it the optimal sampling operator. Since we restrict
the form of $\Psi$ in (4), (6) is non-deterministic polynomial-time
hard. To solve (6), we can use a greedy algorithm as shown in
Algorithm 1. In a previous work, the authors solved a similar
optimization problem for matrix approximation and showed
that the greedy algorithm gives a good approximation to the
global optimum [38]. Note that $\mathcal{M}$ is the sampling sequence,
indicating which rows to select, and $(V_{(K)})_{\mathcal{M}}$ denotes the
sampled rows from $V_{(K)}$. When increasing the number of
samples, the smallest singular value of $\Psi V_{(K)}$ grows, and thus,
redundant samples make the algorithm robust to noise.


Algorithm 1 Optimal Sampling Operator via Greedy Algorithm

Input:   V_(K)   the first K columns of V
         M       the number of samples
Output:  𝓜       sampling set

Function
    while |𝓜| < M
        m = argmax_i σ_min((V_(K))_{𝓜 + {i}})
        𝓜 ← 𝓜 + {m}
    end
    return 𝓜
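
Algorithm 1 could be implemented along the following lines. This is a sketch under stated assumptions rather than the authors' code: it uses NumPy, a random symmetric shift for illustration, and a hypothetical function name `greedy_sampling_set`. While fewer than $K$ rows have been selected, the smallest singular value is taken over the singular values NumPy returns for the wide submatrix, which is one reasonable reading of the pseudocode.

```python
import numpy as np

def greedy_sampling_set(V_K, M):
    """Greedily pick M row indices of V_K, maximizing the smallest singular value."""
    N = V_K.shape[0]
    chosen = []
    for _ in range(M):
        best_index, best_sigma = -1, -np.inf
        for i in range(N):
            if i in chosen:
                continue
            # svd returns min(rows, K) singular values in decreasing order.
            sigma_min = np.linalg.svd(V_K[chosen + [i], :], compute_uv=False)[-1]
            if sigma_min > best_sigma:
                best_index, best_sigma = i, sigma_min
        chosen.append(best_index)
    return chosen

rng = np.random.default_rng(0)
N, K, M = 20, 3, 5
W = rng.random((N, N))
_, V = np.linalg.eigh((W + W.T) / 2)     # hypothetical undirected weighted graph
V_K = V[:, :K]

samples = greedy_sampling_set(V_K, M)
sigma_greedy = np.linalg.svd(V_K[samples, :], compute_uv=False)[-1]
```

Each step costs one small singular value decomposition per remaining node, so this direct version is meant for graphs of moderate size; `sigma_greedy` is the quantity that (6) seeks to maximize.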


2) Simulations: We consider 150 weather stations in the 
United States that record local temperatures 0. The graph 
representing these weather stations is obtained by assigning an 
edge when the geodesic distance between each pair of weather 
stations is smaller than 500 miles, that is, the graph shift A is 
formed as

$$A_{i,j} = \begin{cases} 1, & \text{when } 0 < d_{i,j} < 500; \\ 0, & \text{otherwise}, \end{cases}$$

where $d_{i,j}$ is the geodesic distance between the $i$th and the $j$th
weather stations.

We simulate a graph signal with bandwidth of 3 as

$$x = v_1 + 0.5 v_2 + 2 v_3,$$

where $v_i$ is the $i$th column in $V$. We design two sampling
operators to recover $x$: an arbitrary qualified sampling operator
and the optimal sampling operator. We know that both of them
recover $x$ perfectly given 3 samples. To validate the robustness
to noise, we add Gaussian noise with mean zero and variance
0.01 to each sample. We recover the graph signal from its
samples by using the interpolation operator (5).

Figure 4(f) and (h) show the recovered graph signal from
each of these two sampling operators. We see that an arbitrary 
qualified sampling operator is not robust to noise and fails to 
recover the original graph signal, while the optimal sampling 
operator is robust to noise and approximately recovers the 
original graph signal. 


B. Random Sampling 

Previously, we showed that we need to design a qualified 
sampling operator to achieve perfect recovery. We now show 
that, for some specific graphs, random sampling leads to perfect 
recovery with high probabilities. 

1) Frames with Maximal Robustness to Erasures: A frame
$\{f_0, f_1, \ldots, f_{N-1}\}$ is a generating system for $\mathbb{C}^K$ with $N \ge K$,
when there exist two constants $a > 0$ and $b < \infty$, such that
for all $x \in \mathbb{C}^K$,

$$a \|x\|^2 \le \sum_k |f_k^{*} x|^2 \le b \|x\|^2.$$

In finite dimensions, we represent the frame as an $N \times K$
matrix with rows $f_k^{*}$. The frame is maximally robust to erasures
when every $K \times K$ submatrix (obtained by deleting $N - K$
rows) is invertible [39]. In [39], the authors show that a polynomial
transform matrix is one example of a frame maximally
robust to erasures; in [40], the authors show that many lapped
orthogonal transforms and lapped tight frame transforms are
also maximally robust to erasures. It is clear that if the inverse
graph Fourier transform matrix $V$ as in (2) is maximally
robust to erasures, any sampling operator that samples at least
$K$ signal coefficients guarantees perfect recovery; in other
words, when a graph Fourier transform matrix happens to
be a polynomial transform matrix, sampling any $K$ signal
coefficients leads to perfect recovery.

(d) Original graph signal.
(e) Samples from the optimal sampling operator.
(f) Recovery from the optimal sampling operator.
(g) Samples from a qualified sampling operator.
(h) Recovery from a qualified sampling operator.

Fig. 4: Graph signals on the sensor network. Colors indicate values of signal coefficients. Red represents high values, blue
represents low values. Big nodes in (e) and (g) represent the sampled nodes in each case.


For example, a circulant graph is a graph whose adjacency 
matrix is circulant ED. The circulant graph shift, C, can be 
represented as a polynomial of the cyclic permutation matrix, 
$A$. The graph Fourier transform of the cyclic permutation
matrix is the discrete Fourier transform, which is again a
polynomial transform matrix. As described above, we have

$$C = \sum_{i=0}^{L-1} h_i A^i = \sum_{i=0}^{L-1} h_i (F^{*} \Lambda F)^i = F^{*} \Big( \sum_{i=0}^{L-1} h_i \Lambda^i \Big) F,$$

where $F$ denotes the discrete Fourier transform matrix, $L$ is the
order of the polynomial, and $h_i$ is the coefficient
corresponding to the $i$th order. Since the graph Fourier
transform matrix of a circulant graph is the discrete Fourier
transform matrix, we can perfectly recover a circulant-graph
signal with bandwidth $K$ by sampling any $M \ge K$ signal
coefficients as shown in Theorem 1. In other words, perfect
recovery is guaranteed when we randomly sample a sufficient
number of signal coefficients.

2) Erdos-Renyi Graph: An Erdos-Renyi graph is con¬ 
structed by connecting nodes randomly, where each edge is 
included in the graph with probability p independent of any 
other edge (T|, 0. We aim to show that by sampling K signal 
coefficients randomly, the singular values of the corresponding
ΨV_(K) are bounded.

Lemma 1. Let a graph shift A ∈ R^{N×N} represent an Erdos-Renyi
graph, where each pair of vertices is connected randomly
and independently with probability p = g(N) log(N)/N, and
g(·) is some positive function. Let V be the eigenvector matrix
of A with V^T V = N I, and let the number of sampled
coefficients satisfy

m > max(Cl i„g K , c 2 log |), 

p 0 

for some positive constants C_1, C_2. Then,

\[
P\left( \left\| \frac{1}{M} (\Psi V_{(K)})^T (\Psi V_{(K)}) - I \right\|_2 \le \frac{1}{2} \right) \ge 1 - \delta \tag{7}
\]

is satisfied for all sampling operators Ψ that sample M signal
coefficients.


Proof. For an Erdos-Renyi graph, the eigenvector matrix V
satisfies

\[
\max_{i,j} |V_{i,j}| = O\left( \sqrt{\log^{2.2} g(N) \, \log N / p} \right)
\]

for p = g(N) log(N)/N [42]. By substituting V into Theorem
1.2 in [43], we obtain (7). □

Theorem 4. Let A, V, Ψ be defined as in Lemma 1. With
probability (1 − δ), ΨV_(K) is a frame in C^K with lower bound
M/2 and upper bound 3M/2.

Proof. Using Lemma 1, with probability (1 − δ), we have

\[
\left\| \frac{1}{M} (\Psi V_{(K)})^T (\Psi V_{(K)}) - I \right\|_2 \le \frac{1}{2}.
\]

We thus obtain for all x ∈ C^K,

\[
-\frac{1}{2} x^T x \le x^T \left( \frac{1}{M} (\Psi V_{(K)})^T (\Psi V_{(K)}) - I \right) x \le \frac{1}{2} x^T x,
\]

\[
\frac{M}{2} x^T x \le x^T (\Psi V_{(K)})^T (\Psi V_{(K)}) x \le \frac{3M}{2} x^T x.
\]

□

From Theorem 4, we see that the singular values of ΨV_(K)
are bounded with high probability. This shows that ΨV_(K)
has full rank with high probability; in other words, with high
probability, perfect recovery is achieved for Erdos-Renyi graph
signals when we randomly sample a sufficient number of signal
coefficients.

3) Simulations: We verify Theorem 4 by checking the
probability of satisfying the full-rank assumption by random
sampling on Erdos-Renyi graphs. Once the full-rank assumption
is satisfied, we can find a qualified sampling operator
to achieve perfect recovery; we thus call this probability the
success rate of perfect recovery.

We check the success rate of perfect recovery with various
sizes and connection probabilities. We vary the size of an
Erdos-Renyi graph from 50 to 500 and the connection probabilities
with an interval of 0.01 from 0 to 0.5. For each given size
and connection probability, we generate 100 graphs randomly.
Suppose that for each graph, the corresponding graph signal is
of fixed bandwidth K = 10. Given a graph shift, we randomly
sample 10 rows from the first 10 columns of the graph Fourier
transform matrix and check whether the 10 × 10 matrix is of
full rank. Based on Theorem 1, if the 10 × 10 matrix is of full
rank, perfect recovery is guaranteed. For each given size and
connection probability, we run the random sampling on the 100
graphs and count the number of successes to obtain the success rate.
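The experiment described above can be sketched in a few lines. This is a rough illustration under several assumptions that the text does not fix: the graphs are undirected, the eigenvectors are computed with a symmetric eigensolver and ordered by descending eigenvalue, and one random sampling set is drawn per graph. The function names `erdos_renyi_shift` and `success_rate` are hypothetical.

```python
import numpy as np

def erdos_renyi_shift(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an undirected Erdos-Renyi graph (assumed model)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def success_rate(n, p, bandwidth=10, trials=100, seed=0):
    """Fraction of random graphs for which a random K x K sampled block has full rank."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(trials):
        A = erdos_renyi_shift(n, p, rng)
        eigvals, V = np.linalg.eigh(A)
        order = np.argsort(eigvals)[::-1]          # assumption: descending frequency order
        V_K = V[:, order[:bandwidth]]              # first K columns of the eigenvector matrix
        rows = rng.choice(n, size=bandwidth, replace=False)
        if np.linalg.matrix_rank(V_K[rows, :]) == bandwidth:
            successes += 1
    return successes / trials

rate = success_rate(n=50, p=0.3)
print(rate)
```

Sweeping `p` over a grid and `n` over the sizes of interest would give curves of the kind reported in the simulation, although the exact values depend on the assumptions listed above.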

Figure 5 shows success rates for sizes 50 and 500 averaged
over 100 random tests. When we fix the graph size, in Erdos- 
Renyi graphs, the success rate increases as the connection 
probability increases, that is, more connections lead to higher 
probability of getting a qualified sampling operator. When we 
compare the different sizes of the same type of graph, the 
success rate increases as the size increases, that is, larger graph 
sizes lead to higher probabilities of getting a qualified sampling 
operator. Overall, with a sufficient number of connections, the 
success rates are close to 100%. The simulation results suggest 
that the full-rank assumption is easier to satisfy when there 
exist more connections on graphs. The intuition is that in a 
larger graph with a higher connection probability, the difference 
between nodes is smaller, that is, each node has a similar 
connectivity property and there is no preference to sampling 
one rather than the other. 



Fig. 5: Success rates for Erdos-Renyi graphs. The blue curve 
represents an Erdos-Renyi graph of size 50 and the red curve 
represents an Erdos-Renyi graph of size 500. 


V. Relations & Extensions 

We now discuss three topics: relation to the sampling theory 
for finite discrete-time signals, relation to compressed sensing, 
and how to handle a full-band graph signal. 


A. Relation to Sampling Theory for Finite Discrete-Time Signals

We call the graph that supports a finite discrete-time signal 
a finite discrete-time graph , which specifies the time-ordering 
from the past to future. The finite discrete-time graph can be 
represented by the cyclic permutation matrix eh, m,

\[
A = \begin{bmatrix} 0 & 0 & \cdots & 1 \\ 1 & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 1 & 0 \end{bmatrix} = V \Lambda V^{-1}, \tag{8}
\]

where the eigenvector matrix

\[
V = \begin{bmatrix} v_0 & v_1 & \cdots & v_{N-1} \end{bmatrix} = \left[ \frac{1}{\sqrt{N}} (W^{jk})^* \right]_{j,k=0,\cdots,N-1} \tag{9}
\]

is the Hermitian transpose of the N-point discrete Fourier
transform matrix, V = F^*, V^{-1} is the N-point discrete Fourier
transform matrix, V^{-1} = F, and the eigenvalue matrix is

\[
\Lambda = \mathrm{diag} \begin{bmatrix} W^0 & W^1 & \cdots & W^{N-1} \end{bmatrix}, \tag{10}
\]

where W = e^{−2πj/N}. We see that Definitions § i and
Theorem 1 are immediately applicable to finite discrete-time
signals and are consistent with sampling of such signals ED.


Definition 4. A discrete-time signal is called bandlimited when
there exists K ∈ {0, 1, ···, N−1} such that its discrete Fourier
transform x̂ satisfies

\[
\widehat{x}_k = 0 \quad \text{for all } k \ge K.
\]

The smallest such K is called the bandwidth of x. A discrete-time
signal that is not bandlimited is called a full-band discrete-time
signal.

Definition 5. The set of discrete-time signals in C^N with
bandwidth of at most K is a closed subspace denoted BL_K(F),
with F the discrete Fourier transform matrix.

With this definition of the discrete Fourier transform matrix,
the highest frequency is in the middle of the spectrum (although
this is just a matter of ordering). From Definitions 4 and 5,
we can permute the rows in the discrete Fourier transform
matrix to choose any frequency band. Since the discrete Fourier
transform matrix is a Vandermonde matrix, any K rows of F^*_(K)
are independent [36], ED; in other words, rank(ΨF^*_(K)) = K
always holds when M ≥ K. We now apply Theorem 1 to
obtain the following result.

Corollary 1. Let Ψ satisfy that the sampling number is no
smaller than the bandwidth, M ≥ K. For all x ∈ BL_K(F),
perfect recovery, x = ΦΨx, is achieved by choosing

\[
\Phi = F^*_{(K)} U,
\]

with U Ψ F^*_(K) a K × K identity matrix, and F^*_(K) denotes the
first K columns of F^*.

From Corollary 1, we can perfectly recover a discrete-time
signal when it is bandlimited.
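As an illustration of Corollary 1, the sketch below recovers a bandlimited discrete-time signal from M randomly chosen samples. The sizes N, K, M and the random seed are illustrative assumptions, and U is taken to be the pseudo-inverse of ΨF^*_(K), which is one valid choice of a matrix satisfying U Ψ F^*_(K) = I when that product has full column rank.

```python
import numpy as np

N, K, M = 16, 4, 6                       # assumed sizes with M >= K
rng = np.random.default_rng(0)

idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)   # unitary DFT matrix
F_star_K = F.conj().T[:, :K]             # first K columns of F^*

# A signal in BL_K(F): only the first K frequency coefficients are non-zero.
x_hat_K = rng.standard_normal(K) + 1j * rng.standard_normal(K)
x = F_star_K @ x_hat_K

# Sampling operator Psi: keep M distinct signal coefficients chosen at random.
sampled = np.sort(rng.choice(N, size=M, replace=False))
Psi = np.eye(N)[sampled, :]

# Interpolation operator Phi = F^*_(K) U with U Psi F^*_(K) = I_K.
U = np.linalg.pinv(Psi @ F_star_K)
Phi = F_star_K @ U

x_recovered = Phi @ (Psi @ x)
print(np.allclose(x_recovered, x))
```

Because any K rows of F^*_(K) are linearly independent, the same code should succeed for every choice of `sampled` with at least K entries.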

Similarly to Theorem 2, we can show that a new graph shift
can be constructed from the finite discrete-time graph. Multiple
sampling mechanisms can be used to sample a new graph shift;
an intuitive one is as follows: let x ∈ C^N be a finite discrete-time
signal, where N is even. Reorder the frequencies in (10)
by putting the frequencies with even indices first,

\[
\widetilde{\Lambda} = \mathrm{diag} \begin{bmatrix} \lambda_0 & \lambda_2 & \cdots & \lambda_{N-2} & \lambda_1 & \lambda_3 & \cdots & \lambda_{N-1} \end{bmatrix}.
\]

Similarly, reorder the columns of V in (9) by putting the
columns with even indices first,

\[
\widetilde{V} = \begin{bmatrix} v_0 & v_2 & \cdots & v_{N-2} & v_1 & v_3 & \cdots & v_{N-1} \end{bmatrix}.
\]

One can check that Ṽ Λ̃ Ṽ^{-1} is still the same cyclic permutation
matrix. Suppose we want to preserve the first N/2 frequency
components in Λ̃; the sampled frequencies are then

\[
\widetilde{\Lambda}_{(N/2)} = \mathrm{diag} \begin{bmatrix} \lambda_0 & \lambda_2 & \cdots & \lambda_{N-2} \end{bmatrix}.
\]

Let a sampling operator Ψ choose the first N/2 rows in Ṽ_(N/2),

\[
\Psi \widetilde{V}_{(N/2)} = \left[ \frac{1}{\sqrt{N}} (W^{2jk})^* \right]_{j,k=0,\cdots,N/2-1},
\]

which is the Hermitian transpose of the discrete Fourier
transform of size N/2 and satisfies rank(ΨṼ_(N/2)) = N/2
in Theorem 2. The sampled graph Fourier transform matrix
U = (ΨṼ_(N/2))^{-1} is the discrete Fourier transform of size
N/2. The sampled graph shift is then constructed as

\[
A_M = U^{-1} \widetilde{\Lambda}_{(N/2)} U,
\]

which is exactly the N/2 × N/2 cyclic permutation matrix.
Hence, we have shown that by choosing an appropriate
sampling mechanism, a smaller finite discrete-time graph is
obtained from a larger finite discrete-time graph by using
Theorem 2. We note that using a different ordering or sampling
operator would result in a graph shift that can be different
and non-intuitive. This is simply a matter of choosing different
frequency components.
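The construction above can be verified numerically. The following is a small sketch, assuming N = 8 (any even N should behave the same way); it reorders the eigenvalues and eigenvectors with even indices first, keeps the first N/2 frequencies and the first N/2 rows, and checks that the resulting sampled graph shift equals the cyclic permutation matrix of size N/2.

```python
import numpy as np

N = 8                                    # assumed even signal length
half = N // 2
idx = np.arange(N)

# V = F^*: columns are the eigenvectors of the cyclic permutation matrix.
V = np.exp(2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
lam = np.exp(-2j * np.pi * idx / N)      # eigenvalues W^k, W = exp(-2*pi*1j/N)

# Reorder: even-indexed frequencies first, then odd-indexed ones.
even_first = np.concatenate([idx[0::2], idx[1::2]])
V_tilde = V[:, even_first]
lam_tilde = lam[even_first]

# The reordered decomposition still gives the same cyclic permutation matrix.
A = np.roll(np.eye(N), 1, axis=0)
A_from_reordered = V_tilde @ np.diag(lam_tilde) @ np.linalg.inv(V_tilde)

# Keep the first N/2 frequencies and sample the first N/2 rows.
Psi_V = V_tilde[:half, :half]
U = np.linalg.inv(Psi_V)                 # sampled graph Fourier transform matrix
A_M = np.linalg.inv(U) @ np.diag(lam_tilde[:half]) @ U

A_half = np.roll(np.eye(half), 1, axis=0)
print(np.allclose(A_from_reordered, A), np.allclose(A_M, A_half))
```

Replacing `even_first` with another ordering is an easy way to see how a different choice of frequencies changes the sampled graph shift.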


B. Relation to Compressed Sensing 

Compressed sensing is a sampling framework to recover 
sparse signals with a few measurements RBI . The theory asserts 
that a few samples guarantee the recovery of the original signals 
when the signals and the sampling approaches are well-defined 
in some theoretical aspects. To be more specific, given the
sampling operator Ψ ∈ R^{M×N}, M ≪ N, and the sampled
signal x_M = Ψx, a sparse signal x ∈ R^N is recovered by
solving

\[
\min_{x} \|x\|_0, \quad \text{subject to } x_M = \Psi x. \tag{11}
\]

Since the ℓ_0 norm is not convex, the optimization is a
non-deterministic polynomial-time hard (NP-hard) problem. To obtain a
computationally efficient algorithm, the ℓ_1-norm based algorithm,
known as basis pursuit or basis pursuit with denoising, recovers
the sparse signal with small approximation error m. 

In the standard compressed sensing theory, signals have to be
sparse or approximately sparse to guarantee accurate recovery
properties. In ESI, the authors proposed a general way to 
perform compressed sensing with non-sparse signals using 
dictionaries. Specifically, a general signal x ∈ R^N is recovered
by

\[
\min_{x} \|Dx\|_0, \quad \text{subject to } x_M = \Psi x, \tag{12}
\]

where D is a dictionary designed to make Dx sparse. When 
specifying x to be a graph signal, and D to be the appropriate 
graph Fourier transform of the graph on which the signal 
resides, Dx represents the frequency content of x, which is 
sparse when x is of limited bandwidth. Equation (12) recovers a
bandlimited graph signal from a few sampled signal coefficients 
via an optimization approach. The proposed sampling theory 
deals with the cases where the frequencies corresponding to 
non-zero elements are known, and can be reordered to form 
a bandlimited graph signal. Compressed sensing deals with 
the cases where the frequencies corresponding to non-zero 
elements are unknown, which is a more general and harder 
problem. If we have access to the position of the non-zero 
elements, the proposed sampling theory uses the smallest 
number of samples to achieve perfect recovery. 


C. Relation to Signal Recovery on Graphs 

Signal recovery on graphs attempts to recover graph signals 
that are assumed to be smooth with respect to underlying 
graphs, from noisy, missing, or corrupted samples [23]. Previous
works studied signal recovery on graphs from different
perspectives. For example, in [47], the authors considered that
a graph is a discrete representation of a manifold, and aimed 
at recovering graph signals by regularizing the smoothness 
functional; in [48], the authors aimed at recovering graph 
signals by regularizing the combinatorial Dirichlet; in [23], the
authors aimed at recovering graph signals by regularizing the 
graph total variation; in (H, the authors aimed at recovering 
graph signals by finding the empirical modes; in CS), the 
authors aimed at recovering graph signals by training a graph- 
based dictionary. These works focused on minimizing the 
empirical recovery error, and dealt with general graph signal 
models. It is thus hard to show when and how the missing 
signal coefficients can be exactly recovered. 

Similarly to signal recovery on graphs, sampling theory on 
graphs also attempts to recover graph signals from incomplete 
samples. The main difference is that sampling theory on graphs 
focuses on a subset of smooth graph signals, that is, bandlimited
graph signals, and theoretically analyzes when and how
the missing signal coefficients can be exactly recovered. The 
authors in CEB, Eo), 02) considered a similar problem to ours, 
that is, recovery of bandlimited graph signals by sampling a few 
signal coefficients in the vertex domain. The main differences 
are as follows: (1) we focus on the graph adjacency matrix, not 
the graph Laplacian matrix; (2) when defining the bandwidth, 
we focus on the number of frequencies, not the values of 
frequencies. Some recent extensions of sampling theory on
graphs include [51], [52], [53].


D. Graph Downsampling & Graph Filter Banks 

In classical signal processing, sampling refers to sampling 
a continuous function and downsampling refers to sampling a 
sequence. Both concepts use fewer samples to represent the 
overall shape of the original signal. Since a graph signal is 
discrete in nature, sampling and downsampling are the same. 
Previous works implemented graph downsampling via graph
coloring [6] or minimum spanning tree [54].

The proposed sampling theory provides a family of qualified 
sampling operators 0 with an optimal sampling operator as 
in (6). To downsample a graph by 2, one can set the bandwidth
to a half of the number of nodes, that is, K = N/ 2, and use 0 


to obtain an optimal sampling operator. An example for the
finite discrete-time signals was shown in Section V-A.

As shown in Theorem 1, perfect recovery is achieved when
graph signals are bandlimited. To handle full-band graph signals,
we propose an approach based on graph filter banks,
where each channel does not need to recover perfectly but in
conjunction they do.

Let x be a full-band graph signal, which, without loss of
generality, we can express as the addition of two bandlimited
signals supported on the same graph, that is, x = x^l + x^h, where

\[
x^l = P^l x, \qquad x^h = P^h x,
\]

and

\[
P^l = V \begin{bmatrix} I_K & 0 \\ 0 & 0 \end{bmatrix} V^{-1}, \qquad
P^h = V \begin{bmatrix} 0 & 0 \\ 0 & I_{N-K} \end{bmatrix} V^{-1}.
\]

We see that x^l contains the first K frequencies, x^h contains
the other N − K frequencies, and each is bandlimited. We
do sampling and interpolation for x^l and x^h in two channels,
respectively. Take the first channel as an example. Following
Theorems 1 and 2, we use a qualified sampling operator Ψ^l
to sample x^l and obtain the sampled signal coefficients as
x^l_{M^l} = Ψ^l x^l, with the corresponding graph as A_{M^l}. We can
recover x^l by using the interpolation operator Φ^l as x^l = Φ^l x^l_{M^l}.
Finally, we add the results from both channels to obtain the
original full-band graph signal (also illustrated in Figure 6).
The main benefit of working with a graph filter bank is that,
instead of dealing with a long graph signal with a large graph,
we are allowed to focus on the frequency bands of interest
and deal with a shorter graph signal with a small graph in
each channel.

We do not restrict the samples from the two bands, x^l_{M^l} and
x^h_{M^h}, to have the same size because we can adaptively design the
sampling and interpolation operators based on their own
sizes. This is similar to the filter banks in the classical literature
where the spectrum is not evenly partitioned between the channels
[55]. We see that the above idea can easily be generalized
to multiple channels by splitting the original graph signal into
multiple bandlimited graph signals; instead of dealing with a
huge graph, we work with multiple small graphs, which makes
computation easier.
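A two-channel graph filter bank of this kind can be sketched as follows. The example is only illustrative: it assumes a small random symmetric graph shift (so that the eigenvector matrix is well conditioned), orders the frequencies by descending eigenvalue, and picks a qualified sampling set in each channel by random trial rather than by the optimal sampling operator. The helper names `qualified_rows` and `run_channel` are hypothetical.

```python
import numpy as np

def qualified_rows(V_band, rng, max_tries=100):
    """Randomly search for rows whose square block of V_band has full rank."""
    n, k = V_band.shape
    for _ in range(max_tries):
        rows = np.sort(rng.choice(n, size=k, replace=False))
        if np.linalg.matrix_rank(V_band[rows, :]) == k:
            return rows
    raise RuntimeError("no qualified sampling set found")

def run_channel(x_band, V_band, rng):
    """Sample one bandlimited channel and interpolate it back to full length."""
    rows = qualified_rows(V_band, rng)
    samples = x_band[rows]                           # sampled signal coefficients
    Phi = V_band @ np.linalg.inv(V_band[rows, :])    # interpolation operator
    return Phi @ samples

rng = np.random.default_rng(0)
N, K = 12, 5                                         # assumed sizes

A = rng.random((N, N))
A = (A + A.T) / 2                                    # hypothetical symmetric graph shift
eigvals, V = np.linalg.eigh(A)
V = V[:, np.argsort(eigvals)[::-1]]                  # assumption: descending frequency order
V_inv = np.linalg.inv(V)

x = rng.standard_normal(N)                           # full-band graph signal
x_hat = V_inv @ x
x_low = V[:, :K] @ x_hat[:K]                         # first K frequencies
x_high = V[:, K:] @ x_hat[K:]                        # remaining N - K frequencies

x_rec = run_channel(x_low, V[:, :K], rng) + run_channel(x_high, V[:, K:], rng)
print(np.allclose(x_rec, x))
```

Each channel stores only as many samples as it has frequencies (K and N − K here), which is the sense in which the two channels need not have the same size.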

Simulations. We now show an example where we analyze
graph signals by using the proposed graph filter banks.
Similarly to Section IV-A2, we consider that the weather
stations across the U.S. form a graph and temperature values
measured at each weather station in one day form a graph
signal. Suppose that a high-frequency component represents
some pattern of weather change; we want to detect this pattern
given the temperature values. We can decompose a graph
signal of temperature values into a low-frequency channel
(largest 15 frequencies) and a high-frequency channel (smallest
5 frequencies). In each channel, we sample the bandlimited
graph signal to obtain a sparse and lossless representation.
Figure 7 shows a comparison between temperature values on
January 1st, 2013 and May 1st, 2013. We intentionally added
some high-frequency component to the temperature values on
January 1st, 2013. We see that the high-frequency channel
in Figure 7(a) detects the change, while the high-frequency
channel in Figure 7(b) does not.

Fig. 6: Graph filter bank that splits the graph signal into two bandlimited graph signals. In each channel, we perform sampling
and interpolation, following Theorem 1. Finally, we add the results from both channels to obtain the original full-band graph
signal.

Directly using the graph frequency components can also 
detect the high-frequency components, but the graph frequency 
components cannot be easily visualized. Since the graph 
structure and the decomposed channels are fixed, the optimal 
sampling operator and corresponding interpolation operator in 
each channel can be designed in advance, which means that 
we just need to look at the sampled coefficients of a fixed set 
of nodes to check whether a channel is activated. The graph 
filter bank is thus fast and visualization friendly. 

VI. Applications 

The proposed sampling theory on graphs can be applied to
semi-supervised learning, whose goal is to classify data with
a few labeled and a large number of unlabeled samples [56].
One approach is based on graphs, where the labels are often
assumed to be smooth on graphs. From a signal processing
perspective, smoothness can be expressed as the low-pass nature
of the signal. Recovering smooth labels on graphs is then
equivalent to interpolating a low-pass graph signal. We now
look at two examples: classification of online blogs
and of handwritten digits.

1) Sampling Online Blogs: We first aim to investigate the 
success rate of perfect recovery by using random sampling, 
and then classify the labels of the online blogs. We consider 
a dataset of N = 1224 online political blogs as either 
conservative or liberal ED- We represent conservative labels 
as +1 and liberal ones as —1. The blogs are represented by a 
graph in which nodes represent blogs, and directed graph edges 
correspond to hyperlink references between blogs. The graph 
signal here is the label assigned to the blogs, called the labeling 
signal. We use the spectral decomposition in 0 for this 
online-blog graph to get the graph frequencies in a descending 
order and the corresponding graph Fourier transform matrix. 
The labeling signal is a full-band signal, but approximately 
bandlimited. 

To investigate the success rate of perfect recovery by using
random sampling, we vary the bandwidth K of the labeling
signal with an interval of 1 from 1 to 20, randomly sample K
rows from the first K columns of the graph Fourier transform
matrix, and check whether the K × K matrix has full rank. For
each bandwidth, we randomly sample 10,000 times, and count
the number of successes to obtain the success rate. Figure 8(a)
shows the resulting success rate. We see that the success
rate decreases as we increase the bandwidth; it is above 90%
when the bandwidth is no greater than 20. This means that we
can achieve perfect recovery by using random sampling with a
fairly high probability. As the bandwidth increases, even though
the number of samples equals the bandwidth, the success rate still
decreases, because when we take more samples, it is easier to get
a sample correlated with the previous samples.

Since a qualified sampling operator is independent of graph
signals, we precompute the qualified sampling operator for
the online-blog graph, as discussed in Section III-B. When
the labeling signal is bandlimited, we can sample M labels
from it by using a qualified sampling operator, and recover
the labeling signal by using the corresponding interpolation
operator. In other words, we can design a set of blogs to
label before querying any label. Most of the time, however,
the labeling signal is not bandlimited, and it is not possible
to achieve perfect recovery. Since we only care about the
sign of the labels, we use only the low-frequency content to
approximate the labeling signal; after that, we set a threshold
to assign labels. To minimize the influence from the
high-frequency content, we can use the optimal sampling operator
in Algorithm 1.

We solve the following optimization problem to recover the
low-frequency content,

\[
\widehat{x}^{\rm opt}_{(K)} = \arg\min_{\widehat{x}_{(K)} \in \mathbb{R}^{K}} \left\| \mathrm{sgn}(\Psi V_{(K)} \widehat{x}_{(K)}) - x_M \right\|_2^2, \tag{13}
\]

where Ψ ∈ R^{M×N} is a sampling operator, x_M ∈ R^M is a
vector of the sampled labels whose elements are either +1 or
−1, and sgn(·) sets all positive values to +1 and all negative
values to −1. Note that without sgn(·), the solution of (13)
is (ΨV_(K))^{-1} x_M in Theorem 1, which perfectly recovers the
labeling signal when it is bandlimited. When the labeling signal
is not bandlimited, the solution of (13) approximates the
low-frequency content. The ℓ_2 norm in (13) can be relaxed by the logit
function and solved by logistic regression [58]. The recovered
labels are then x^opt = sgn(V_(K) x̂^opt_(K)).
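The simpler case without sgn(·) can be sketched directly, since it reduces to a least-squares problem followed by thresholding. The example below is an assumption-laden illustration and not the logistic-regression relaxation itself: it uses a hypothetical random symmetric graph shift, an exactly bandlimited signal whose sign plays the role of the labels, and randomly chosen sampled nodes. The function name `recover_labels` is hypothetical.

```python
import numpy as np

def recover_labels(V_K, sampled, x_M):
    """Least-squares estimate of the low-frequency content, then thresholding at zero."""
    coeffs, *_ = np.linalg.lstsq(V_K[sampled, :], x_M, rcond=None)
    return np.sign(V_K @ coeffs)

rng = np.random.default_rng(0)
N, K, M = 30, 3, 6                                  # assumed sizes with M >= K

A = rng.random((N, N))
A = (A + A.T) / 2                                   # hypothetical symmetric graph shift
eigvals, V = np.linalg.eigh(A)
V_K = V[:, np.argsort(eigvals)[::-1][:K]]           # assumption: descending frequency order

x = V_K @ rng.standard_normal(K)                    # exactly bandlimited signal
true_labels = np.sign(x)

sampled = np.sort(rng.choice(N, size=M, replace=False))
x_M = x[sampled]                                    # sampled signal coefficients

estimated_labels = recover_labels(V_K, sampled, x_M)
print(np.mean(estimated_labels == true_labels))
```

When only the ±1 labels are observed at the sampled nodes, the signal is no longer bandlimited and the least-squares step gives an approximation, which is where the relaxation mentioned in the text comes in.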

(a) Temperature on January 1st, 2013 with high-frequency pattern. (b) Temperature on May 1st, 2013.

Fig. 7: Graph filter banks analysis.

(a) Success rate as a function of bandwidth. (b) Classification accuracy as a function of the number of samples.

Fig. 8: Classification for online blogs. When increasing the bandwidth, it is harder to find a qualified sampling operator. The
experimentally designed sampling with the optimal sampling operator outperforms random sampling.

Figure 8(b) compares the classification accuracies between
optimal sampling and random sampling by varying the sample
size with an interval of 1 from 1 to 20. We see that the
optimal sampling significantly outperforms random sampling,
and random sampling does not improve with more samples,
because the interpolation operator assumes that the sampling
operator is qualified, which is not always true for random
sampling, as shown in Figure 8(a). Note that classification
accuracy for the optimal sampling is as high as 94.44% by only
sampling two blogs, and the classification accuracy gets slightly
better as we increase the number of samples. Compared with
the previous results [24], to achieve around 94% classification
accuracy,

• harmonic functions on graph samples 120 blogs; 

• graph Laplacian regularization samples 120 blogs; 

• graph total variation regularization samples 10 blogs; and 

• the proposed optimal sampling operator (6) samples 2
blogs.

The improvement comes from the fact that, instead of 
sampling randomly as in m, we use the optimal sampling 
operator to choose samples based on the graph structure. 

2) Classification for Handwritten Digits: We aim to use the
proposed sampling theory to classify handwritten digits and
achieve high classification accuracy with fewer samples.

We work with two handwritten digit datasets, the MNIST
[59] and the USPS [60]. Each dataset includes ten classes
(0-9 digit characters). The MNIST dataset includes 60,000
samples in total. We randomly select 1000 samples for each
digit character, for a total of N = 10,000 digit images; each
image is normalized to 28 × 28 = 784 pixels. The USPS dataset
includes 11,000 samples in total. We use all the images in the
dataset; each image is normalized to 16 × 16 = 256 pixels.

Since images of the same digit are similar, it is intuitive to
build a graph to reflect the relational dependencies among
images. For each dataset, we construct a 12-nearest neighbor
graph to represent the digit images. The nodes represent digit
images and each node is connected to 12 other nodes that represent
the most similar digit images; the similarity is measured
by the Euclidean distance. The graph shift is constructed as 
Aj with 


\[
P_{i,j} = \exp\left( \frac{-N^2 \, \|f_i - f_j\|_2}{\sum_{i,j} \|f_i - f_j\|_2} \right),
\]


with fi a vector representing the digit image. The graph shift 
is asymmetric, representing a directed graph, which cannot be 
handled by graph Laplacian-based methods. 

Similarly to Section VI-1, we aim to label all the digit
images by actively querying the labels of a few images. To
handle 10-class classification, we form a ground-truth matrix
X of size N × 10. The element X_{i,j} is +1 if the ith image
belongs to the jth digit class, and is −1 otherwise. We obtain
the optimal sampling operator Ψ as shown in Algorithm 1.
The querying samples are then X_M = ΨX ∈ R^{M×10}. We
recover the low-frequency content as

\[
\widehat{X}^{\rm opt}_{(K)} = \arg\min_{\widehat{X}_{(K)} \in \mathbb{R}^{K \times 10}} \left\| \mathrm{sgn}(\Psi V_{(K)} \widehat{X}_{(K)}) - X_M \right\|_2^2. \tag{14}
\]

We solve (14) approximately by using logistic regression and
then obtain the estimated label matrix X^opt = V_(K) X̂^opt_(K) ∈
R^{N×10}, whose element (X^opt)_{i,j} shows a confidence of labeling
the ith image as the jth digit. We finally label each digit
image by choosing the one with largest value in each row of
X^opt.

(a) MNIST. (b) USPS.

Fig. 9: Graph representations of the MNIST and USPS datasets. For both datasets, the nodes (digit images) with the same
digit characters are shown in the same color and the big black dots indicate 10 sampled nodes by using the optimal sampling
operators in Algorithm 1.


The graph representations of the MNIST and USPS datasets,
and the optimal sampling sets are shown in Figure 9. The
coordinates of nodes come from the corresponding rows of the
first three columns of the inverse graph Fourier transform. We
see that the images with the same digit characters form clusters,
and the optimal sampling operator chooses representative
samples from different clusters.

Figure 10 shows the classification accuracy by varying the
sample size with an interval of 10 from 10 to 100 for both
datasets. For the MNIST dataset, we query 0.1% to 1% of the images;
for the USPS dataset, we query 0.09% to 0.9% of the images. We
achieve around 90% classification accuracy by querying only
0.5% of the images for both datasets. Compared with the previous
results [35], in the USPS dataset, given 100 samples,

• local linear reconstruction is around 65%;

• normalized cut based active learning is around 70%;

• graph sampling based active semi-supervised learning is
around 85%; and

• the proposed optimal sampling operator (6) with the
interpolation operator (14) achieves 91.69%.




Fig. 10: Classification accuracy of the MNIST and USPS 
datasets as a function of the number of querying samples. 


VII. Conclusions 

We proposed a novel sampling framework for graph signals 
that follows the same paradigm as classical sampling theory 
and strongly connects to linear algebra. We showed that perfect 
recovery is possible when graph signals are bandlimited. The 
sampled signal coefficients then form a new graph signal, 
whose corresponding graph structure is constructed from the 
original graph structure, preserving the first-order difference 
of the original graph signal. We studied a qualified sampling
operator for both random sampling and experimentally designed
sampling. We further established the connection to the
sampling theory for finite discrete-time signal processing and
previous works on the sampling theory on graphs, and showed
how to handle full-band graph signals by using graph filter
banks. We showed applications to semi-supervised classification
of online blogs and digit images, where the proposed
sampling and interpolation operators perform competitively.